Goto

Collaborating Authors

 auc loss


Robust low-rank estimation with multiple binary responses using pairwise AUC loss

arXiv.org Machine Learning

Multiple binary responses arise in many modern data-analytic problems. Although fitting separate logistic regressions for each response is computationally attractive, it ignores shared structure and can be statistically inefficient, especially in high-dimensional and class-imbalanced regimes. Low-rank models offer a natural way to encode latent dependence across tasks, but existing methods for binary data are largely likelihood-based and focus on pointwise classification rather than ranking performance. In this work, we propose a unified framework for learning with multiple binary responses that directly targets discrimination by minimizing a surrogate loss for the area under the ROC curve (AUC). The method aggregates pairwise AUC surrogate losses across responses while imposing a low-rank constraint on the coefficient matrix to exploit shared structure. We develop a scalable projected gradient descent algorithm based on truncated singular value decomposition. Exploiting the fact that the pairwise loss depends only on differences of linear predictors, we simplify computation and analysis. We establish non-asymptotic convergence guarantees, showing that under suitable regularity conditions, leading to linear convergence up to the minimax-optimal statistical precision. Extensive simulation studies demonstrate that the proposed method is robust in challenging settings such as label switching and data contamination and consistently outperforms likelihood-based approaches.


Online AUC Optimization Based on Second-order Surrogate Loss

arXiv.org Artificial Intelligence

The Area Under the Curve (AUC) is an important performance metric for classification tasks, particularly in class-imbalanced scenarios. However, minimizing the AUC presents significant challenges due to the non-convex and discontinuous nature of pairwise 0/1 losses, which are difficult to optimize, as well as the substantial memory cost of instance-wise storage, which creates bottlenecks in large-scale applications. To overcome these challenges, we propose a novel second-order surrogate loss based on the pairwise hinge loss, and develop an efficient online algorithm. Unlike conventional approaches that approximate each individual pairwise 0/1 loss term with an instance-wise surrogate function, our approach introduces a new paradigm that directly substitutes the entire aggregated pairwise loss with a surrogate loss function constructed from the first- and second-order statistics of the training data. Theoretically, while existing online AUC optimization algorithms typically achieve an $\mathcal{O}(\sqrt{T})$ regret bound, our method attains a tighter $\mathcal{O}(\ln T)$ bound. Furthermore, we extend the proposed framework to nonlinear settings through a kernel-based formulation. Extensive experiments on multiple benchmark datasets demonstrate the superior efficiency and effectiveness of the proposed second-order surrogate loss in optimizing online AUC performance.


FROC: Building Fair ROC from a Trained Classifier

arXiv.org Artificial Intelligence

This paper considers the problem of fair probabilistic binary classification with binary protected groups. The classifier assigns scores, and a practitioner predicts labels using a certain cut-off threshold based on the desired trade-off between false positives vs. false negatives. It derives these thresholds from the ROC of the classifier. The resultant classifier may be unfair to one of the two protected groups in the dataset. It is desirable that no matter what threshold the practitioner uses, the classifier should be fair to both the protected groups; that is, the $\mathcal{L}_p$ norm between FPRs and TPRs of both the protected groups should be at most $\varepsilon$. We call such fairness on ROCs of both the protected attributes $\varepsilon_p$-Equalized ROC. Given a classifier not satisfying $\varepsilon_1$-Equalized ROC, we aim to design a post-processing method to transform the given (potentially unfair) classifier's output (score) to a suitable randomized yet fair classifier. That is, the resultant classifier must satisfy $\varepsilon_1$-Equalized ROC. First, we introduce a threshold query model on the ROC curves for each protected group. The resulting classifier is bound to face a reduction in AUC. With the proposed query model, we provide a rigorous theoretical analysis of the minimal AUC loss to achieve $\varepsilon_1$-Equalized ROC. To achieve this, we design a linear time algorithm, namely \texttt{FROC}, to transform a given classifier's output to a probabilistic classifier that satisfies $\varepsilon_1$-Equalized ROC. We prove that under certain theoretical conditions, \texttt{FROC}\ achieves the theoretical optimal guarantees. We also study the performance of our \texttt{FROC}\ on multiple real-world datasets with many trained classifiers.


Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images

arXiv.org Artificial Intelligence

Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields. However, their ability to create hyper-realistic images poses significant challenges in distinguishing between real and synthetic content, raising concerns about digital authenticity and potential misuse in creating deepfakes. This work introduces a robust detection framework that integrates image and text features extracted by CLIP model with a Multilayer Perceptron (MLP) classifier. We propose a novel loss that can improve the detector's robustness and handle imbalanced datasets. Additionally, we flatten the loss landscape during the model training to improve the detector's generalization capabilities. The effectiveness of our method, which outperforms traditional detection techniques, is demonstrated through extensive experiments, underscoring its potential to set a new state-of-the-art approach in DM-generated image detection. The code is available at https://github.com/Purdue-M2/Robust_DM_Generated_Image_Detection.


Training Differentially Private Ad Prediction Models with Semi-Sensitive Features

arXiv.org Artificial Intelligence

Motivated by problems arising in digital advertising, we introduce the task of training differentially private (DP) machine learning models with semi-sensitive features. In this setting, a subset of the features is known to the attacker (and thus need not be protected) while the remaining features as well as the label are unknown to the attacker and should be protected by the DP guarantee. This task interpolates between training the model with full DP (where the label and all features should be protected) or with label DP (where all the features are considered known, and only the label should be protected). We present a new algorithm for training DP models with semi-sensitive features. Through an empirical evaluation on real ads datasets, we demonstrate that our algorithm surpasses in utility the baselines of (i) DP stochastic gradient descent (DP-SGD) run on all features (known and unknown), and (ii) a label DP algorithm run only on the known features (while discarding the unknown ones).